Finding the "truncated" polynomial that is closest to a function
When implementing regular enough functions (e.g., elementary or special
functions) on a computing system, we frequently use polynomial approximations.
In most cases, the polynomial that best approximates (for a given distance and
in a given interval) a function has coefficients that are not exactly
representable with a finite number of bits. And yet, the polynomial
approximations that are actually implemented do have coefficients that are
represented with a finite - and sometimes small - number of bits: this is due
to the finiteness of the floating-point representations (for software
implementations), and to the need to have small, hence fast and/or inexpensive,
multipliers (for hardware implementations). We then have to consider polynomial
approximations for which the degree-i coefficient has at most m_i
fractional bits (in other words, it is a rational number with denominator
2^{m_i}). We provide a general method for finding the best polynomial
approximation under this constraint. Then, we suggest refinements that can be
used to accelerate our method. Comment: 14 pages, 1 figure
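The coefficient constraint can be illustrated with a small brute-force sketch (all names and parameters here are hypothetical, and the paper's actual method is far more efficient than this exhaustive search): starting from the real-valued degree-2 Taylor coefficients of exp as a stand-in for the best unconstrained polynomial, we quantize each coefficient to a multiple of 2^{-4} and search a small neighborhood, showing that rounding each coefficient to the nearest representable value is not necessarily the best choice.

```python
import itertools
import math

f = math.exp                      # function to approximate on [0, 1]
STEP = 2.0 ** -4                  # coefficients restricted to multiples of 2^-4
real = [1.0, 1.0, 0.5]            # degree-2 Taylor coefficients of exp at 0
grid = [i / 512 for i in range(513)]

def sup_error(c):
    """Max error of c[0] + c[1]*x + c[2]*x^2 against f over the grid."""
    return max(abs((c[2] * x + c[1]) * x + c[0] - f(x)) for x in grid)

def candidates(c, spread=2):
    """Quantized values within `spread` steps of the nearest multiple of STEP."""
    q = round(c / STEP)
    return [(q + d) * STEP for d in range(-spread, spread + 1)]

naive = [round(c / STEP) * STEP for c in real]   # round each coefficient
best = min(itertools.product(*(candidates(c) for c in real)), key=sup_error)
print(sup_error(naive), sup_error(list(best)))
```

Since the nearest-rounded polynomial is itself one of the searched candidates, the search can only match or beat it; the paper's contribution is finding the true optimum without enumerating candidates.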
Computing Integer Powers in Floating-Point Arithmetic
We introduce two algorithms for accurately evaluating powers to a positive
integer in floating-point arithmetic, assuming a fused multiply-add (fma)
instruction is available. We show that our log-time algorithm always produces
faithfully-rounded results, discuss the possibility of getting correctly
rounded results, and show that results correctly rounded in double precision
can be obtained if extended-precision is available with the possibility to
round into double precision (with a single rounding). Comment: Laboratoire LIP : CNRS/ENS Lyon/INRIA/Université Lyon
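The log-time evaluation order is ordinary binary exponentiation (square and multiply); a minimal skeleton is sketched below. The paper's contribution lies in carrying out these multiplications with fma-based compensated products to guarantee faithful rounding, which this plain-arithmetic sketch deliberately omits.

```python
def power(x: float, n: int) -> float:
    """Evaluate x**n with O(log n) multiplications (square and multiply).

    Sketch only: the paper's algorithms replace each bare `*` with an
    fma-based double-word product to control the accumulated rounding error.
    """
    result = 1.0
    while n > 0:
        if n & 1:            # current bit of the exponent is set
            result *= x
        x *= x               # square for the next exponent bit
        n >>= 1
    return result
```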
On the error of computing ab + cd using Cornea, Harrison and Tang's method
In their book, Scientific Computing on the Itanium, Cornea et al. [2002] introduce an accurate algorithm for evaluating expressions of the form ab + cd in binary floating-point arithmetic, assuming an FMA instruction is available. They show that if p is the precision of the floating-point format and if u = 2^{-p}, the relative error of the result is of order u. We improve their proof to show that the relative error is bounded by 2u + 7u^2 + 6u^3. Furthermore, by building an example for which the relative error is asymptotically (as p → ∞ or, equivalently, as u → 0) equivalent to 2u, we show that our error bound is asymptotically optimal
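The algorithm itself is short: one rounded product, one fma that recovers that product's rounding error exactly, one fma folding in the other product, and a final addition. A Python sketch follows; since Python may not expose a hardware fma, it is emulated here with exact rational arithmetic (compute a*b + c exactly, then round once), which matches fma semantics.

```python
from fractions import Fraction

def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c with a single rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def cht(a: float, b: float, c: float, d: float) -> float:
    """ab + cd with relative error bounded by 2u + 7u^2 + 6u^3."""
    w = c * d                 # rounded product
    e = fma(c, d, -w)         # rounding error of w, recovered exactly
    f = fma(a, b, w)          # a*b + w with one rounding
    return f + e

# A cancellation case where naive evaluation loses everything:
a, b, c, d = 1 + 2**-27, 1 - 2**-27, -1.0, 1.0
print(a * b + c * d)          # naive evaluation
print(cht(a, b, c, d))        # exact answer is -2**-54
```

Here a*b = 1 - 2**-54 rounds to 1.0 in binary64, so the naive expression returns 0.0, while the compensated version recovers -2**-54 exactly.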
Generating function approximations at compile time
ISBN: 12-4244-0785-0, ISSN: 1058-6393. Usually, the mathematical functions used in numerical programs are decomposed into elementary functions (such as sine, cosine, exponential, logarithm...), and for each of these functions, we use a program from a library. This may have some drawbacks: first, in frequent cases, it is a compound function (e.g. log(1 + exp(-x))) that is needed, so that directly building a polynomial or rational approximation for that function (instead of decomposing it) would result in a faster and/or more accurate calculation. Also, at compile time, we might have some information (e.g., on the range of the input value) that could help to simplify the program. We investigate the possibility of directly building accurate approximations at compile-time
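The direct approach can be sketched in a few lines (an illustration, not the paper's machinery: the interval, the degree, and the use of Chebyshev-node interpolation are all arbitrary choices here). A compiler knowing that the input lies in [0, 1] could interpolate the compound function log(1 + exp(-x)) once, then emit only the resulting polynomial instead of calls to log and exp.

```python
import math

def cheb_nodes(n, a, b):
    """n Chebyshev interpolation nodes mapped to [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]

def newton_coeffs(xs, ys):
    """Divided-difference coefficients of the interpolating polynomial."""
    c = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - j])
    return c

def newton_eval(c, xs, x):
    """Evaluate the Newton-form polynomial with a Horner-like recurrence."""
    r = c[-1]
    for i in range(len(c) - 2, -1, -1):
        r = r * (x - xs[i]) + c[i]
    return r

g = lambda x: math.log(1.0 + math.exp(-x))   # the compound function
xs = cheb_nodes(6, 0.0, 1.0)                 # degree-5 interpolant on [0, 1]
coeffs = newton_coeffs(xs, [g(x) for x in xs])
err = max(abs(newton_eval(coeffs, xs, t / 1000) - g(t / 1000)) for t in range(1001))
print(err)
```

The runtime cost then drops to one degree-5 polynomial evaluation, with a worst-case error on the interval that the compiler can measure (and certify) ahead of time.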
Towards clean primitives in computer arithmetic
The IEEE-754 standard for floating-point arithmetic specifies the behavior of the four arithmetic operations. A specification of the elementary functions should emerge in the years to come. In this article, we examine the advantages that can be drawn from a system whose "numerical primitives" are fully specified
Avoiding double roundings in scaled Newton-Raphson division
When performing divisions using Newton-Raphson (or similar) iterations on a processor with a floating-point fused multiply-add instruction, one must sometimes scale the iterations to avoid over/underflow and/or loss of accuracy. This may lead to double roundings, resulting in output values that may not be correctly rounded when the quotient falls in the subnormal range. We show how to avoid this problem
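The unscaled iteration looks as follows (a sketch with the fma emulated via exact rational arithmetic; the scaling the abstract is concerned with, needed when operands approach the under/overflow thresholds, is deliberately omitted here).

```python
from fractions import Fraction

def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c with a single rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def nr_divide(a: float, b: float, iters: int = 3) -> float:
    """Quotient a/b via Newton-Raphson refinement of 1/b, fma throughout.

    Sketch only: a real implementation starts y from a table lookup and
    scales a and b first to avoid the double roundings discussed above.
    """
    y = 1.0 / b                  # starting reciprocal approximation
    for _ in range(iters):
        e = fma(-b, y, 1.0)      # residual 1 - b*y, single rounding
        y = fma(y, e, y)         # y <- y + y*e
    q = a * y                    # initial quotient
    r = fma(-b, q, a)            # remainder a - b*q, single rounding
    return fma(r, y, q)          # Markstein-style final correction
```

The final fma-based correction step is what makes the quotient correctly rounded in the normal range; the paper addresses the subnormal cases where this guarantee breaks down.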
Solving Systems of Linear Equations in Complex Domain: Complex E-Method
The E-method, introduced by Ercegovac, allows efficient parallel solution of diagonally dominant systems of linear equations in real domain using simple and highly regular hardware. Since the evaluation of polynomials and certain rational functions can be achieved by solving the corresponding linear systems, the E-method is an attractive general approach for function evaluation. We generalize the E-method to complex linear systems, and show some potential applications such as the evaluation of complex polynomials and rational functions
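The reduction of polynomial evaluation to a linear system can be sketched in a few lines. A plain Jacobi iteration in ordinary complex arithmetic stands in here for the digit-serial E-method hardware: p(z) = a_0 + a_1 z + ... + a_n z^n becomes the bidiagonal system y_i - z*y_{i+1} = a_i (with y_n = a_n), whose first unknown is p(z), and the system is diagonally dominant when |z| < 1.

```python
def jacobi(A, b, iters=20):
    """Plain Jacobi iteration; converges when A is diagonally dominant."""
    n = len(b)
    x = [0j] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

def eval_poly_via_system(coeffs, z):
    """Evaluate sum(coeffs[i] * z**i) by solving the bidiagonal system."""
    n = len(coeffs)
    A = [[1.0 if i == j else (-z if j == i + 1 else 0.0) for j in range(n)]
         for i in range(n)]
    return jacobi(A, coeffs, iters=n + 2)[0]

# Complex example: p(z) = 1 + 2z + 3z^2 at z = 0.25j, i.e. 0.8125 + 0.5j
print(eval_poly_via_system([1 + 0j, 2 + 0j, 3 + 0j], 0.25j))
```

Because the off-diagonal part of this system is nilpotent, the iteration settles in at most n steps; the E-method's appeal is that its hardware produces the solution digits serially, one per cycle, rather than iterating to convergence.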